Document page classification algorithms in low-end copy pipeline

Authors

  • Xiaogang Dong
  • Kai-Lung Hua
  • Peter Majewicz
  • Gordon McNutt
  • Charles A. Bouman
  • Jan P. Allebach
  • Ilya Pollak
Abstract

We develop real-time, low-complexity image classification algorithms suitable for a copy mode selector embedded in a low-end copier. The algorithms classify scanned images represented in RGB or in an opponent color space. Classes are the eight combinations of mono/color and text/mix/picture/photo. Classification is 30–98% accurate, with misclassifications tending to be benign. The algorithms provide for improved copy quality, a simplified user interface, and increased copy rate. © 2008 SPIE and IS&T. DOI: 10.1117/1.3010879
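As a minimal sketch of the label space described in the abstract, the eight classes can be enumerated as the product of the two color modes and the four content types. The identifiers below (`COLOR_MODES`, `CONTENT_TYPES`, `copy_mode_classes`) are illustrative assumptions, not names from the paper, and the sketch does not implement the classification algorithms themselves.

```python
from itertools import product

# Label names come from the abstract; the identifiers are hypothetical.
COLOR_MODES = ("mono", "color")
CONTENT_TYPES = ("text", "mix", "picture", "photo")


def copy_mode_classes():
    """Return every color-mode/content-type pair as a 'mode/type' string."""
    return [
        f"{mode}/{content}"
        for mode, content in product(COLOR_MODES, CONTENT_TYPES)
    ]


# Two color modes times four content types gives the eight classes.
CLASSES = copy_mode_classes()
```

A copy mode selector would output exactly one of these eight labels per scanned page.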

Similar resources

DOCUMENT PAGE CLASSIFICATION AND NONLINEAR DIFFUSION FILTERING FOR IMAGE SEGMENTATION AND NOISE REMOVAL A Dissertation

Dong, Xiaogang Ph.D., Purdue University, May, 2007. Document Page Classification and Nonlinear Diffusion Filtering for Image Segmentation and Noise Removal. Major Professor: Ilya Pollak. We develop a real-time, strip-based, low-complexity document page classification algorithm, which can be used as a copy mode selector in the copy pipeline. It analyzes the scan images and classifies them into o...

Persian Printed Document Analysis and Page Segmentation

This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution stage, a pyramidal image structure is constructed for multiscale analysis, and the document image is segmented into a set of regions. In the high-resolution stage, connected components analysis segments each region into homogeneous regions and identifyi...

Automatic Web Page Classification

The aim of this paper is to describe a method for automatically classifying web pages into semantic domains, together with its evaluation. The classification method exploits machine learning algorithms and several morphological and semantic text processing tools. In contrast to general text document classification, web document classification often runs into problems with short web pages. In this p...

A Patient-Centric SNV-CNV Pipeline

This application note outlines the Illumina methodology for estimating DNA copy number for data produced on Affymetrix Genome Wide Human single nucleotide polymorphism (SNP) 5.0 and 6.0 arrays on the BaseSpace Correlation Engine. Within a patient-centric context, data are obtained for an individual patient rather than a batch. Also, patient data are often supplied without a matching reference. ...

Noise reduction through summarization for Web-page classification

Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the perfor...

Journal title:
  • J. Electronic Imaging

Volume 17  Issue 

Pages  -

Publication date 2008